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Abstract 

Graph representations have been widely used to analyze and design various economic, social, military, political, and 
biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical 
treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As- 
Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and 
helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory 
to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and 
the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize 
mathematical properties of the Laplacian eigenvector (|x2) corresponding to the second eigenvalues {X2) associated with 
the topology matrix defining the graph: X2 describes the overall topology, and the sum of \i2's components is zero. The 
three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, 
and largest gap of \x2's components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA 
structures up through 11 vertices (~220 nucleotides). While we observe that the median cut divides a graph into two 
similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the 
gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while 
maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA 
structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. 
In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs. 
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Introduction 

Ribonucleotide Acid (RNA) has become a prominent subject in 
modern biology, due to recent discoveries of RNA's vital roles in 
regulating gene expression, which come in addition to well-known 
roles in protein synthesis [1,2,3]. Based on these new discoveries, 
new applications are being pursued in areas such as therapeutic 
biotechnology, by using RNA's editing, silencing, and other 
regulatory capabilities to activate and deactivate genes, deliver 
drugs, and design new nanomaterials [4,5]. Like other molecules, 
all of these functions of RNA are closely tied to the three- 
dimensional structures that RNAs adopt. Thus, to explore these 
new potential functions of RNA, it is essential to understand the 
principles of RNA's architecture. Such an understanding can 
naturally lead to RNA design as well, another area of intense 
current interest. 

At the heart of RNA structure is its modularity [6,7,8]. RNA's 
diverse structures are generated by the combination of recurrent 
modules on three different levels: sequence (ID), secondary (2D), 



and tertiary (3D) structures. RNA is a single-stranded polymer 
whose sugar-phosphate backbone with contains four primary 
building blocks. Adenine (A), Guanine (G), Uracil (U) and 
Cytosine (C). Modified bases also occur. This single-stranded 
polymer folds upon itself, to form GC, AU, or GU ("wobble") base 
pairs which define double-helical regions ("stems"), imperfect with 
single-stranded regions named "hairpin loops", "internal loops", 
and "junctions", with one, two, or more adjacent helical arms, 
respectively on the 2D (or base-pairing) level. Through other 
interactions in space, complex 3D structures form. Several 3D 
modules called motifs (e.g., coaxial helix, A-minor, ribose zipper, 
kissing hairpin, right-angles, twist-joint and double twist-joints 
motifs) have been identified by manual and computational 
inspection from experimentally resolved structures. 

Such modularity and hierarchy offers us a solid ground for 
conceptual and mathematical methods, such as graph theory, to 
investigate RNA's structural repertoire. Graph theory is a well- 
established field of mathematics, which has been used extensively 
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in a variety of economic, social, engineering, biological, and 
medical contexts to describe and analyze complex networks 
[9,10,11,12]. Essentially, the foundations of graph theory can be 
used to enumerate and analyze combinatorial properties of 
networks [13,14,15,16]. In the field of RNA structure. Waterman 
pioneered the development of the graphical representation of 
RNA primary/secondary network on the base level [17], and 
Shapiro and coworkers extended a tree representation of RNA 
secondar\f network at the base-pair level to measure structural 
similarity [18]. More recently, Schlick and coworkers developed 
the RNA-As-Graph (RAG) framework and web resource (http:// 
www.biomath.nyu.edu/ ma) to represent global RNA topologies as 
graphs (see Figure 1) [19,20]. RAG has been pursued to 
enumerate, analyze, and predict RNA topologies, expanding our 
understanding of RNA's structural repertoire. Interesting applica- 
tions include prediction of RNA-like topologies [20,21,22,23], 
prediction of non-coding RNA [24,25], in silico modeling of the 
in vitro selection process for RNA design [26,27,28], analysis of 
large viral RNA [29,30], and riboswitch analysis and design 
[31,32] (see reviews [33,34]). Recendy, RAG 2D graph represen- 
tations have been extended to 3D, and the substantial reduction in 
conformational space size has been exploited to enhance the 
sampling of 3D topologies to predict helical arrangements of RNA 
[35,36]. 

In RAG, RNA structures are simplified as tree or dual graphs 
by translating RNA modules to graph theory objects such as 
vertices and edges. When helices are translated to edges and other 
modules are translated to vertices, RNA structures are represented 
as tree graphs (see Figure lA). Using graph theory, all possible 2D 
topologies of RNAs can be enumerated [19,20]. 

The Laplacian matrix of a graph provides a quantitative 
framework to describe the topology. In particular, the second 
eigenvalue of the Laplacian matrix has been used as a topological 
descriptor of RNA [20,21]. The enumerated repertoire of tree- 
graph motifs has made possible classification of these motifs as 
existing (experimentally found) and hypothetical. Using the 
Laplacian eigenvalues as variables, we have used clustering 
analysis of RNA graphs to predict which of the remaining 
hypothetical motifs are "RNA-like" [19,20,21]. RNA-like graphs 
have been investigated to design targeted pools for in vitro 
selection [28] and have been merged to design larger RNA-like 
topologies [23]. 

Here, we present another application of graph theory to explore 
the modularity of RNA structures by partitioning RNA graphs 

using graph theory methods. The analysis of RNA structures at 
large has identified modular RNA structures, which are composed 
of repetitive motifs, in which patterns appear hierarchically from 
2D to 3D structural levels [7]. Thus, a build-up of existing modules 
is a natural way to produce new structures by fragment assembly 
approaches, as we have done previously [21,37]. However, so far, 
only a limited number of known motifs have been found by 
manual inspecti(m, and there is no S5 stematic way to divide whole 
RNAs into pieces for fragment assembly. Toward this goal, we 
apply graph partitioning methods to RNA tree graphs. Our 
computational approach divides large RNA structures into small 
recurrent motifs based on spectral graph partitioning. 

The mathematical theory of graph partitioning is well 
developed [38], but the application of the graph partition 
algorithm to biology has not been attempted as far as we are 
aware. Because the topological properties of RNA graphs can be 
described by the second eigenvalue (^2) of the Laplacian matrix, 
we utilize the eigenvector (|x2) of the Laplacian matrix corre- 
sponding to X2. This eigenvector [l2 provides us information on 
how to divide a graph into smaller fragments that minimize 



topological dependencies between fragments. We utilize the zero- 
sum and property of the |x2 elements. We split vertices 1 ton at k 
into two disjointed sets {«;,... ,4} and {ik+i,---,in} by |x2's sorted 
elements {v,7, v,2, ...,Vft,..., v,„} where v,i^v,2...-^v,k ^...<M„, 
and k is determined by a splitting value 5 such that Vik—S. We use 
three standard choices for the splitting value s to define three 
partitioning algorithms: median, sign, and gap. For the median 
and sign cuts, we select the splitting value s as the median of 
eigenvector elements or as 0, respectively, where in the latter 
negative values are separated from positives. For the gap cut, the 
splitting value s is in the largest gap in the sorted list of |x2 
components. We apply these three methods to all existing graph 
topologies discovered experimentally up through 1 1 vertices. Our 
analysis of the 45 RNA graphs from 4 to 1 1 vertices shows that the 
gap cut partitions structures into the most topologically distinct 
pieces. All resulting subgraphs correspond to existing motifs. Thus, 
the gap cut appears the most natural for RNA substructuring. Our 
iterative gap partitioning approach further suggests a systematic 
procedure to divide a large RNA structure into small RNA motifs 
and assemble the resulting modules to large RNAs. Permutations 
of sequences corresponding to the building blocks in the desired 
order could be used to suggest candidate sequences corresponding 
to target motifs. 

This paper is organized as follows. We begin by describing 
methods including mathematical formulation of RNA graphs, 
Laplacian matrices and spectrum, three graph partition algorithms 
(median, sign and gap cuts) based on the second Laplacian 
eigenvector \i2, and the RNA data set that we use. We then 
present results for the topological aspects described by the second 
Laplacian eigenvector, partitioning results for RNA graphs, and 
iterative partitioning results. The final discussion provides future 
directions of the work. 

Methods 

RNA graphs, Laplacian eigenvalues, and eigenvectors 

We represent RNA as a graph G using RAG tree representation 
(see Figure lA). In the RAG tree graphs, RNA 2D structural 
elements - stems, loops, bulges, and junctions - are converted into 
2D graphical objects with the following rules: 

(1) an edge (— ) represents a double-stranded helical stem with 
more than one base pair. 

(2) a vertex (•) represents a single strand that occurs in segments 
connecting 2D structural elements such as bulges, loops, and 
junctions. Here, a bulge motif is considered to be an internal 
symmetric or asymmetric loop with more than one 
unmatched nucleotide or one unstable base pair. 

Figure 1 shows three examples — a linear structure 6_1, a 3-way 
branched structure 6_2, and a 4-way structure 6_5. The RAG 
graph index shows the total number of vertices and the topological 
complexity: RAG graph 6_1, 6_2, and 6_5 have six vertices and 
the subscribed numbers (1, 2, and 5) shows the increased 
complexity. We label the graph vertices by the order of vertices 
corresponding to sequences from a 5' end to a 3' end. 

For a graph G = (F,£) where V is the set of labeled vertices and 
E is the set of edges, we define the associated Laplace matrix 
(called the Laplacian) L = {my) =A—D, where ^ is an adjacency 
matrix and Z) is a diagonal matrix, as the n xn matrix where 
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Figure 1. RNA-As-Graph (RAG) tree representation of RNA secondary structures and graph partitions using the second Laplacian 
eigenvector. (A) Examples of RNA secondary structures, their corresponding RAG tree graphs, Laplacian matrices, and the second eigenvectors: (a) 
signal recognition particle (PDB code: 2XXA, Graph ID: 6_1); (b) glycine riboswitch (PDB code: 30WZ, Graph ID: 6_2); (c) RNase P (PDB code: 3Q1Q, 
Graph ID: 6_5). (B) Graph partitioning using the cut values of the median, sign, or the largest gap of the second Laplacian eigenvector with examples 
of three RAG graphs 6_1, 6_2, and 6_5. 
doi:1 0.1 371 /journal.pone.01 06074.g001 
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Figure 2. Graph partitioning resuits for RNAs corresponding to RAG tree grapKis witii vertex number (a) V = 4, 5, or 6, (b) V = 7, (c) 
V=8, (d) V = 9, (e) V = 10, and (f) V=11. RAG tree graph, Laplacian eigenvector, cut values/results of median/sign/gap partitions, and 
corresponding RNA secondary structures are shown. 
doi:l 0.1 371/journal.pone.01 06074.g002 



idii) if i=j 
- 1 if 
0 if{iJ)^E 

and the value d(i) is the total number of edges from vertex i and n 
is the total number of vertices {n = \ V\). See Figure lA for the 
Laplacian matrices for three graph examples. 

Since the Laplacian matrix L is symmetric, the eigenvalues of 
the Laplacian matrix (XI, X2, ...Xn) are non-negative with the 
smallest eigenvalue XI =0. In the field of the spectral graph 
theory, the Laplacian eigenvalues and eigenvectors have been 
extensively studied since they describe combinatorial or topolog- 
ical properties of a graph. In particular, the second smallest 
eigenvalue, X2, and its corresponding eigenvector, \i2, provide 
information about topological compactness and the partitioning 
properties of graphs, respectively. The second eigenvalue X2 is 
positive for any connected graph and increases with the 
compactness of a graph (see Figure lA). The second eigenvector 
(i,2 = (vi,V2,...,v„) corresponding to X2 provides local information 
on the connectivity of each vertex; it also provides information on 
how to subdivide a large RNA into smaller fragments that 
minimize dependencies between fragments (see below for details). 

n 

Note that the sum of elements of ^2 is 0 (X] i', = 0) because 

i=l 

vector 10,1 = (1,1,..., 1) is an eigenvector of the Laplacian corre- 
sponding to the first eigenvalue (XI =0) and the eigenvectors of a 
symmetric matrix are orthogonal, i.e., the inner product of 
(1,1,. ..,1) and {vi,V2,...,v„) is zero. 

Graph partitioning algorithm 

The basic idea in graph partitioning is to split the vertex set V 
into two disjointed sets, A= {ieV : v,>s} and A = {ieV : Vi<s}, 
for some given value s. Based on the zero-sum property of the 
second Laplacian eigenvector |a,2, a spectral partitioning technique 
can be used to split the graph according to a splitting value s on the 
basis of the second eigenvector [i2, as elaborated by Spiegelman 
and Teng on the spectral partitioning technique [38]. We use 
three standard choices for the splitting value s: median, sign, and 
gap cuts. For the median cut, we select s as the median of 
{vi,V2,...,v„}, namely m, and thus the two sets are {ieV : v, >ot} 
and {ieV : Vi<m}. For the sign cut, we select i = 0 and divide V 
into {ieV : v, >0} and {ieV : v, <0}. For the gap cut, we list the 
eigenvector components {vl,V2^■••,v„} in ascending order 
{v,i,v,2,...,v,„}, where v,i < v,2 < ■■• < Vi„; we then calculate the 
difference of every two neighboring elements 
(kii -f,-2|,|v,-2-v,3|,...,|v,(„_i)-v,„|), called the "gap", and select 
s to be the value defining the largest gap: If |Vy — v,(/+i)| is the 
largest gap, then {va,Vi2,...,v,y} and {v,(,+i),v,(,+2),...,v,„} are two 
disjointed set of vertices by the gap cut. Furthermore, we apply the 
graph-partitioning algorithm iteratively until all substructured 
graphs correspond to existing RNAs (the simplest motifs have all 
structures experimentally deduced). After one partition, a graph G 
becomes two subgraphs, namely, Gi and G2 after one partition. 
After k iterations, G becomes Gj, G2,..., GJ" until all G; 
correspond to some existing RNA graphs. 



RNA data set 

We apply the graph partitioning algorithms to 45 RAG tree 
graphs with 4 to 1 1 vertices that have corresponding known 
structures in the Protein Data Bank (PDB) database (http://www. 
biomath.nyu.edu/ rna, N. Baba et al. in preparation). Note that a 
graph topology can correspond to multiple RNAs. See Figure 2 
for the list of RAG tree graphs and their corresponding secondary 
structures and partition results. 

Results 

Topological aspects described by the second Laplacian 
eigenvector 

Before describing results of our RNA partitioning, it is useful to 
understand the overall topologies of RNA graphs. The second 
eigenvalue {X2) and eigenvector (|i2) of the Laplacian matrix 
constructed from the adjacency and degree matrices of each graph 
describe overall graph connectivity. The eigenvalue X2 measures 
the graph complexity. For example, Figure lA shows that a linear 
RNA molecule has a smaller X2 value than a branched molecule: a 
linear graph 6_1 has A-2 = 0.27, while branched graphs 6_2 and 
6_5 have X2 = 0.32 and X2 = 0.49, respectively. 

The corresponding eigenvector |j,2 describes the local topolog- 
ical aspect of a graph. Thus, .symmetry produces same numerical 
components. For example. Figure lA shows that for Graph 6_1, a 
simple symmetric linear structure with 6 vertices, [l2 elements 
increase from —0.56 to 0.56 and have symmetry around 0 
({-0.56, -0.41, -0.15, 0.15, 0.41, 0.56}). For Graph 6_2 witii a 
3-way branch connected with a linear structure, |i2 is {—0.42, 
-0.28, -0.42, 0.08, 0.42, 0.62}. For the 3-way structure, the 
elements are {-0.42, -0.28, -0.42, 0.08}, where the center of 
the branch corresponds to —0.28, and the symmetric branches 
correspond to —0.42. The last branch connected to a linear 
structure is 0.08 with two more vertices with increased elements of 
0.42 and 0.62. Graph 6_5, a 4-way branch connected with an 
elongated edge, has ^2 = {0.32, 0.16, 0.32, -0.38, -0.73, 0.32}. 
The vertices 1, 3, and 6, which correspond to the 4-way branch, 
have the same element value as 0.32. The central vertex 2 has the 
value 0.16 and the linearly connected vertices have decreasing 
negative values —0.38 and —0.73. Since local topological features 
are reflected by the components of \1l2, the ordering of elements 
can be used to partition graphs. 

Topological aspects of subgraphs partitioned by the 
median, sign, and gap cuts 

From Figure IB, we see that the partitioning results for the 
three methods are overall similar for three sample graphs but difler 
in fine details. Figure IB shows three examples of graphs - RAG 
graphs 6_1, 6_2, and 6_5 with partitioning by the median, sign, 
and gap cuts. For these three graphs, the median cut partitions 
graphs into two subgraphs with the same number of vertices; the 
gap cut partitions the graphs into two topologically distinct graphs; 
and the sign cut results are same as either the median cut or the 
gap cut, depending on the graph topology. For the linear structure 
6_1, all three partitioning have the same result: the linear graph 
6_1 is divided into two linear structures with the same vertex 
numbers ({ l,2,3}/{4,5,6}). For the 3-way branched graph 6_2, 
the median and sign cuts have the same result, with two vertex sets 
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with the same number of three vertices ({1,2,3}/ {4,5,6}). The gap 
cut, however, produces the graph into two distinct topological 
features: a 3-way branch with four vertices ({1,2,3,4}) and a linear 
structure with two vertices ({5,6}). For a 4-way branch graph 6_5, 
the median method partitions the graph into two three-vertex 



graphs while the sign and gap methods cut the graph into a 4-way 
branch ({1,2,3,6}) and a linear structure with two vertices ({4,5}). 

Gap cut: best partitioning algorithm for RNA graphs 

These three partitioning methods applied to 45 RAG graphs 
from 4 through 1 1 vertices corresponding to existing RNAs (whose 
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experimental structures have been deposited in the PDB database) 
produce the results shown in Figure 2. Here, RAG tree graphs, |i2, 
the cut values, and results for the small (4—6 vertices. Figure 2a), 
medium (7-8 vertices, Figures 2b, 2c), and large (9-11 vertices. 
Figures 2d-f) graphs are shown. The overall cut patterns are 
similar as analyzed for the simple examples above. Namely, the 
median cut partitions the graph into two graphs with the same 
number of vertices; the gap cut partitions the graph into two 
topologically distinct graphs regardless of the vertex numbers of 
the partitioned graphs; and the sign cuts coincide with either the 
median or gap cuts but usually the gap cut. As a result, among 45 
existing RNA graphs, 28 cases yield the same results for all three 
methods. For 13 cases (e.g., graphs 6_5, 8_10, 8_14, 9_4, 10_30, 
1 1_89), the sign and gap cuts produce the same partitioning of the 
graphs as the high-branched graph plus the linear graphs, while 
the median cut breaks the branched graphs corresponding to 
junctions. For four cases, graphs 4_2, 6_2, 10_2, and 10_4, the 
gap cut keeps the 3-way junction structure and cuts the connected 
hairpin or internal loop structures. On the other hand, the sign 
and median cuts have the same partitioning results: junctions are 
br()k("n to obtain t^vo graphs with equal vertex numbers. Thus, 
among the three partitioning methods, the gap cut method is well 
suited for partitioning RNA graphs into basic modules of internal 
loops, junctions, and hairpins without breaking them, which is 
usually energetically favorable. 

Iterative gap cuts to partition large RNA 

For large RNAs, we apply the gap partitioning iteratively to 
analyze how basic modules are assembled to make larger ones. For 
example, we partition graph 10_19 corresponding to the glycine 
riboswitch (Figure 3), which has two 3-way junctions. After one 
step of gap partitioning, the two graphs, 5_2 and 6_2, connected 
by the vertex ID 1 result (for the Laplacian eigenvector elements of 
RAG ID 10_19, see Figure 2e). The subgraphs 5_2 and 6_2 
contain one 3-way junction, and the vertex 1 corresponds to the 
RNA single strand connecting these two 3-way junctions. After 
two steps of gap partitioning applied to the second-generation 
subgraphs 5_2 and 6_2, four third-generation-graphs result: 5_2 
becomes 2_1 and 4_2, and 6_2 becomes 5_2 and 2_1. See 
Figure 2b for the Laplacian eigenvector elements for graphs 5_2 
and 6_2. After the third partitioning of 5_2, we produce five 
minimal modules including two 3-way junctions and three hairpin 
structures to form the graph 10_19 (Graphs 2_1, 4_2, 4_2, 2_1, 
and 2_1). 

Such an iterative partitioning of gap cuts also suggests 
hierarchical assembly procedures from the basic modules building 
up to a large structure by an inverse cutting procedure. For 
example, to build up an RNA structure of 1 0_ 1 9, we start with five 
graphs with vertex labels in the fourth row in Figure 3: 2_1 (vertex 
labels {3,4}), 4_2 (vertex labels {1,2,3,5}), 4_2 (vertex labels 
{1,6,7,8}), 2_1 (vertex labels {8,9}), and 2_1 (vertex labels {9,10}). 
The first step of assembly suggested by the iterative gap cut 
procedure is a combination of graphs 4_2 (vertex labels {1,6,7,8}) 
and 2_1 (vertex labels {8,9}). From the overlapped vertex labels, 
the connection point is also suggested: graphs 4_2 (three-way 
junction) and 2_1 (hairpin loop and dangling end) can be merged 
through vertex ID 8 to form graph 5_2 (three-way junction and 
internal loop). The next step suggests assemblies of 2_1 and 4_2 to 
5_2 and 5_2 and 2_1 to 6_2, and the final step is assembly of 5_2 
and 6_2, to yield the target graph 10_19. This gap cut/ assembly 
procedure indicates that the first step in RNA structure assembly is 
the connection of junctions to other structures. 



Discussion and Conclusions 

We have presented graph-partitioning approaches using three 
splitting values - median, sign, and gap - based on the Laplacian 
eigenvector |j,2 for understanding modular features of RNA. Our 
application of these spectral algorithms to all 45 existing RNA 
graphs from 4 through 1 1 vertices (http://www.biomath.nyu.edu/ 
rna, N. Baba et al. in preparation) has suggested concrete 
substructuring as well as design procedures. Namely, sequences 
corresponding to these basic motifs can be combined in a build-up 
type optimization we have done for RNAs [21]. Our RAG 
database has already catalogued both existing RNA topologies and 
hypothetical RNA graphs, and further classified the latter class 
into RNA-like or non-RNA-Kke topologies. For example, there are 
42, 100, and 227 graphs with 9, 10, and 1 1-vertices, respectively, 
which do not have corresponding existing RNAs but are 
considered RNA-like (or probable) (N. Baba et al, in preparation). 
For these RNA-like motifs, a build-up procedure is a reasonable 
approach. 

The RAG tree graph representation of RNA structures captures 
the global helical connectivity and offers mathematical founda- 
tions to measure and analyze RNA topologies. While the second 
Laplacian eigenvalue X2 describes the overall compactness of the 
graph, the elements of second Laplacian eigenvector [l2 describe 
the topological contributions/locations of each vertex to the 
overall motif For example, a linear structure has simply increasing 
or decreasing |a,2 elements depending on how the vertices are 
labeled (see Graphs 4_1, 5_1, 6_1 in Figure 2a, Graph 7_1 in 
Figure 2b). In a branched structure, the vertices of branch ends 
have the same [l2 components (for example, see Vertex ID 1 and 3 
of Graph ID 5_2 and 6_2 in Figure 2a). 

Thus, partitioning vertices into two groups by the gap cut 
algorithm provides a mathematically reasonable and biologically 
relevant graph partitioning result. The median cut partitions a 
graph into two equal-sized subgraphs. The largest gap cut 
partitions a graph into two topologically distinct subgraphs. For 
example, graph 6_2 in Figure 2a has two 3-vertex subgraphs, but 
the gap cut produces one 4-vertex subgraph with a junction 
structure and another 2-vertex subgraph plus a linear structure. 

Since junction structures cannot easily be divided energetically, 
the gap cut suggests a quantitative and systematic approach to 
describe basic modules and their hierarchical assembly. As shown 
in the example of the graph of 10_19 in Figure 3, our algorithm 
suggests how a large RNA can be built from RNA building blocks. 
The gap partitioning can be utihzed to design RNA sequences that 
fold into the target graph, which could help to expand the 
structural repertoire of RNAs. 

Our prior design work by build-up has used simple division of 
ten dual graphs to predict novel motifs [21]. Already half of them 
were experimentally determined by different methods such as X- 
ray crystallography, nuclear magnetic resonance spectroscopy, or 
comparative analysis of genomic sequences [20]. We further 
observed high sequence similarit)^ between designed and actual 
RNA sequences, much greater than expected by chance (25%) 
[20]. However, such an ad-hoc buUd-up procedure cannot be used 
to design complex pseudoknot structures systematically [20]. 

The application of our partitioning approach for RNA tree 
graphs to RNA design is limited to pseudoknot-free structures. 
Since pseudoknot motifs are important for general RNA 
applications, build-up approaches based on partitioning algo- 
rithms for dual graphs are required. In particular, dealing with 
self-loops (edges connecting the same vertex) requires further 
development. It may also be possible to approach pseudoknot 
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partitioning using modified tree graphs that have additional edges 
or graph elements accounting for pseudoknot interactions. 

Our work here is the first application of standard graph 
partitioning algorithms to the area of biology. The partitioning 
procedures can be extended to the design of novel RNA motifs by 
assembling modules corresponding to the subgraphs identified 
from partitioning, as we have done previously for ten specific 
motifs [21]. Of course, topology is just one aspect. Chemistry and 
thermodynamics must be considered as well. For example, free 
energy calculations and energy landscape analyses are further 
needed to screen in-silico designed sequences that fold onto the 
target topologies. In our analysis of RNA riboswitch energy 
landscapes, for example, we found the distribution and barrier 
type of the conformational clusters in riboswitch energy landscapes 
to be useful for discriminating various riboswich classes depending 
on the thermodynamic control of ligand-binding [31,32]. 

A generalization of RNA graph partitioning to other areas of 
biology can certainly be envisioned. Many biological networks, 
such as metabolic pathways associated with biochemical reactions 
and regulatory protein interaction networks, have been construct- 
ed by organizing building blocks. Thus, the application of graph 
partitioning to these systems could define a valuable tool to 
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